Clustering View-Segmented Documents via Tensor Modeling

Authors

  • Salvatore Romeo
  • Andrea Tagarelli
  • Dino Ienco

Abstract

We propose a clustering framework for view-segmented documents, i.e., relatively long documents made up of smaller fragments that can be provided according to a target set of views or aspects. The framework is designed to exploit a view-based document segmentation into a third-order tensor model, whose decomposition result would enable any standard document clustering algorithm to better reflect the multifaceted nature of the documents. Experimental results on document collections featuring paragraph-based, metadata-based, or user-driven views have shown the significance of the proposed approach, highlighting performance improvement in the document clustering task.
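The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not say which tensor decomposition is used, so this example assumes a simple stand-in: the tensor is unfolded along the document mode and reduced with a truncated SVD (the document-mode step of a higher-order SVD). The toy documents, the view names `title` and `body`, and the helper names `build_tensor` and `document_factors` are hypothetical and serve illustration only.

```python
import numpy as np

def build_tensor(docs, views, vocab):
    """Build a documents x terms x views tensor of raw term counts.

    docs  : list of dicts mapping a view name to that view's text
    views : ordered list of view names (third tensor mode)
    vocab : ordered list of terms (second tensor mode)
    """
    term_index = {term: j for j, term in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab), len(views)))
    for i, doc in enumerate(docs):
        for k, view in enumerate(views):
            for tok in doc.get(view, "").lower().split():
                j = term_index.get(tok)
                if j is not None:
                    X[i, j, k] += 1.0
    return X

def document_factors(X, rank):
    """Assumed decomposition: unfold along the document mode, then truncated SVD.

    Returns one low-dimensional row per document, which any standard
    clustering algorithm could take as input.
    """
    unfolded = X.reshape(X.shape[0], -1)  # documents x (terms * views)
    U, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    return U[:, :rank] * s[:rank]

# Hypothetical toy collection: each document is segmented into two views.
docs = [
    {"title": "tensor clustering", "body": "tensor decomposition document clustering"},
    {"title": "tensor models", "body": "clustering documents with tensor models"},
    {"title": "camera tracking", "body": "multi camera tracking and occlusion"},
    {"title": "camera surveillance", "body": "camera tracking surveillance"},
]
views = ["title", "body"]
vocab = sorted({tok for d in docs for text in d.values() for tok in text.lower().split()})

X = build_tensor(docs, views, vocab)
Z = document_factors(X, rank=2)
print(X.shape, Z.shape)
```

The rows of `Z` act as view-aware document representations. In this toy collection the first two documents and the last two share no terms across groups, so their rows land on different components, and a standard clustering algorithm applied to `Z` would be expected to separate them.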

Similar resources

Semantic-Based Multilingual Document Clustering via Tensor Modeling

A major challenge in document clustering research arises from the growing amount of text data written in different languages. Previous approaches depend on language-specific solutions (e.g., bilingual dictionaries, sequential machine translation) to evaluate document similarities, and the required transformations may alter the original document semantics. To cope with this issue we propose a ne...

Multi-View Subspace Clustering via Relaxed L1-Norm of Tensor Multi-Rank

In this paper, we address the multi-view subspace clustering problem. Our method uses the circulant algebra for a tensor that is constructed by stacking the subspace representation matrices of different views and then shifting, in order to explore the high-order correlations underlying multi-view data. By introducing a recently proposed tensor factorization, namely tensor-Singular Value Decomposition...

Multi-Camera Visual Surveillance for Motion Detection, Occlusion Handling, Tracking and Event Recognition

This paper presents novel approaches for background modeling, occlusion handling and event recognition using multi-camera configurations, which can overcome the limitations of single-camera configurations. The main novelty in the proposed background modeling approach is building a multivariate Gaussian background model for each pixel of the reference camera by utilizing homography-re...

Using Clustering Techniques for Non-segmented Language Document Management: A Comparison of K-mean and Self Organizing Map Techniques

Since the number of electronic non-segmented language documents is growing very fast, efficient document clustering techniques for non-segmented languages are needed in today’s world, where many documents are stored and retrieved electronically. Clustering enables one to group similar documents using keywords or terms of the clusters. Thus document clustering can be used to group and ...

Robust Kernelized Multi-View Self-Representations for Clustering by Tensor Multi-Rank Minimization

Recently, tensor-SVD has been applied to multi-view self-representation clustering and has achieved promising results in many real-world applications such as face clustering, scene clustering and generic object clustering. However, tensor-SVD based multi-view self-representation clustering was originally proposed to solve the clustering problem in multiple linear subspaces, leading to...

Publication date: 2014